DeepTrio: a ternary prediction system for protein–protein interaction using mask multiple parallel convolutional neural networks

Abstract
Motivation: Protein–protein interaction (PPI), as a relative property, is determined by two binding proteins, which makes it challenging to design an expert model with an unbiased learning architecture and superior generalization performance. Additionally, few efforts have been made to allow PPI predictors to discriminate between relative properties and intrinsic properties.
Results: We present a sequence-based approach, DeepTrio, for PPI prediction using mask multiple parallel convolutional neural networks. Experimental evaluations show that DeepTrio outperforms several state-of-the-art methods in terms of various quality metrics. Besides, DeepTrio is extended to provide additional insights into the contribution of each input neuron to the prediction results.
Availability and implementation: We provide an online application at http://bis.zju.edu.cn/deeptrio. The DeepTrio models and training data are deposited at https://github.com/huxiaoti/deeptrio.git.
Supplementary information: Supplementary data are available at Bioinformatics online.


Introduction
Various kinds of biological macromolecule interactions, especially protein–protein interactions (PPIs) (Jones and Thornton, 1996), play a fundamental role in biological information exchange, energy production and material transportation. A number of high-throughput and low-throughput experimental approaches, such as yeast two-hybrid, affinity purification followed by mass spectrometry (Lage, 2014), affinity capture-western, cocrystal structure analysis, bimolecular fluorescence complementation and biochemical modification analysis (Oughtred et al., 2019), have been used to identify PPIs. As a result, a tremendous number of PPIs have been identified and deposited in PPI databases such as DIP (Salwinski et al., 2004; Xenarios et al., 2002), BioGRID (Oughtred et al., 2019; Stark et al., 2006) and STRING (Szklarczyk et al., 2019), which makes it possible to predict PPIs in silico as an alternative to time-consuming and labor-intensive experiments.
Traditionally, protein 3D structure has been regarded as an essential profile for PPI prediction. However, with the discovery of intrinsically disordered proteins, whose spatial structures interconvert on a range of timescales (Uversky et al., 2008), 3D structure is no longer regarded as the only determinant of PPIs, and the primary structure may offer further clues for PPI prediction. Since protein sequences can be obtained by many inexpensive and fast experimental technologies or inferred directly from gene sequences, they have become the most accessible type of protein profile. Currently, a variety of protein properties can be predicted from protein sequences. Some of them depend only on the protein itself, such as solubility (an intrinsic property), while others, such as PPI, require information from another molecule (a relative property). However, few existing prediction methods treat PPI as a relative property.
Many sequence-based machine learning methods have been developed for PPI prediction, such as Guo's work (Guo et al., 2008), Wang's work (Wang et al., 2018), DPPI (Hashemifar et al., 2018), DNN-PPI (Li et al., 2018), DeepFE-PPI (Yao et al., 2019) and Protein-Protein Interaction Prediction Based on Siamese Residual RCNN (PIPR) (Chen et al., 2019). Guo's work (Guo et al., 2008) uses seven physicochemical properties of amino acids (such as hydrophobicity, polarity and side-chain volume) as protein feature descriptors, so each protein sequence is represented as seven vectors. For a given protein sequence, auto covariance (AC) variables describe the average interactions between residues separated by a given lag throughout the whole sequence, and a support vector machine (SVM) (Cortes and Vapnik, 1995) then determines whether the given proteins interact. DPPI (Hashemifar et al., 2018) uses PSI-BLAST (Altschul et al., 1997) profiles to construct a comprehensive protein representation. DPPI incorporates a random projection module into the convolutional neural network (CNN) architecture, which projects the protein representations learned by the convolutional layers into two different vector spaces and helps the model learn the interaction potential of the two input proteins. Finally, a linear transformation unit in the prediction module computes a probability value indicating whether the two proteins interact. DeepFE-PPI (Yao et al., 2019) uses a residue representation method, Res2vec, to embed protein sequences, which may more precisely describe residue–residue interactions and supply more effective information to the downstream model. DeepFE-PPI employs a deep neural network (DNN) as the learning architecture and uses both batch normalization and dropout to prevent overfitting.
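To make the AC descriptor concrete, the following is a minimal Python sketch assuming the commonly used form AC(lag) = 1/(n − lag) · Σ (x_i − x̄)(x_{i+lag} − x̄), computed separately for each property table and then concatenated. The table `toy_prop` and its values are hypothetical placeholders, not the seven normalized descriptors of Guo et al. (2008), and the downstream SVM is omitted.

```python
from typing import Dict, List, Sequence


def auto_covariance(seq: str, prop: Dict[str, float], max_lag: int) -> List[float]:
    """AC values for lags 1..max_lag of one residue property along `seq`.

    AC(lag) = 1/(n - lag) * sum_{i=1}^{n-lag} (x_i - mean) * (x_{i+lag} - mean)
    """
    x = [prop[a] for a in seq]
    n = len(x)
    mean = sum(x) / n
    ac = []
    for lag in range(1, max_lag + 1):
        if lag >= n:
            ac.append(0.0)  # sequence too short for this lag; pad with zero
            continue
        total = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
        ac.append(total / (n - lag))
    return ac


def ac_features(seq: str, props: Sequence[Dict[str, float]], max_lag: int) -> List[float]:
    """Concatenate AC vectors over several property tables (one per descriptor)."""
    feats: List[float] = []
    for prop in props:
        feats.extend(auto_covariance(seq, prop, max_lag))
    return feats


# Hypothetical, already-normalized property table (placeholder values only).
toy_prop = {"A": 0.5, "C": -1.0, "G": 0.0, "K": 1.5}
features = ac_features("ACGKACGK", [toy_prop, toy_prop], max_lag=4)
```

With seven descriptor tables, each protein yields a vector of 7 × `max_lag` AC values.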
PIPR (Chen et al., 2019) uses a pretrained semilatent vector to represent amino acids for capturing their contextual similarity and physicochemical properties. PIPR employs a residual recurrent convolutional neural network (RCNN) as the model architecture, and achieves the state-of-the-art performance for PPI prediction. In addition, PIPR is extended to contain three independent models for different application scenarios involving PPI prediction, interaction type prediction and binding affinity estimation.
Although a growing number of PPI predictors have been proposed in recent years, there remains some room for improvement: (i) it can be beneficial for prediction if a model can consider PPI as a relative property rather than an intrinsic property; (ii) few efforts have been made to provide an intuitive description of the inner mechanism of pairwise-input neural networks and illustrate the effect of each amino acid residue on PPI.
In this paper, we propose DeepTrio, a deep-learning framework based on a mask multiscale CNN architecture, in which multiple parallel filters capture multiscale contextual information from protein sequences for PPI prediction. Compared with existing tools, the main contributions of our work are: (i) an additional class, the single-protein class, is introduced into our model, which allows DeepTrio to discriminate between relative and intrinsic properties; (ii) owing to the single-protein class and the masking operation, DeepTrio requires only one training set to build a model that can not only identify PPIs, but also estimate the effect of each protein residue on PPI without any additional task-specific training; (iii) DeepTrio is also available as an online tool, which spares inexperienced users cross-platform and dependency issues.

Materials and methods
Since PPI prediction is a binary classification task, most existing models are trained to classify the input data into two classes: interacting or noninteracting. In contrast, DeepTrio performs ternary prediction: it takes a pair of protein sequences as input and generates a three-dimensional output vector giving the probabilities of the interaction, noninteraction and single-protein classes. The overall framework of DeepTrio is illustrated in Figure 1a. DeepTrio employs a Siamese architecture, in which two identical subnetworks share the same configuration and weights, so that the two input sequences are represented and analyzed equally. In addition, DeepTrio can calculate an importance score for each residue by using the masking method.

Data collection
There are four datasets used for training and testing the models in this study. Two datasets are derived from the Biological General Repository for Interaction Datasets (BioGRID) (Oughtred et al., 2019), and the other two datasets are derived from the database of interacting proteins (DIP) (Salwinski et al., 2004;Xenarios et al., 2002).

BioGRID multivalidated physical interaction data
The BioGRID database (Oughtred et al., 2019) is a comprehensive database of PPIs from multiple major species. Its multivalidated physical interaction subsets include only interacting pairs that have been validated in at least two different experimental systems or reported in at least two different publications. Since Saccharomyces cerevisiae (yeast) and Homo sapiens (human) data are widely used to evaluate PPI predictors (Chen et al., 2019; Guo et al., 2008; Hashemifar et al., 2018; Yao et al., 2019), we use the human and yeast multivalidated physical interaction datasets in BioGRID as benchmarks for training and evaluation. The protein sequences are retrieved from UniProt (UniProt Consortium, 2019) and restricted to lengths between 150 and 1500 residues. The human dataset involves 7705 proteins forming 31 164 positive cases, and the yeast dataset contains 3553 proteins forming 13 462 positive cases. Following the same strategy as PIPR, we use CD-HIT (Fu et al., 2012; Li and Godzik, 2006) to reduce sequence redundancy in the datasets, in which two PPIs are considered similar if they share a sequence identity greater than 40%.
Fig. 1c: The strategy for constructing BioGRID negative datasets. Given an interacting protein pair S_A and S_B, we randomly choose one protein (e.g. S_B) and shuffle its sequence with 2-let counts (excluding the first residue) to obtain a novel protein S′_B. A negative sample is generated by pairing S_A and S′_B.

Negative Set Construction
The negative samples in these two benchmarks are generated by shuffling one sequence of a positive case while preserving its 2-let counts, excluding the first residue of the protein (Fig. 1c). It has been demonstrated that the probability of interaction can be deemed negligible once one sequence of an interacting pair is shuffled (Kandel et al., 1996). Additionally, the shuffled sequence retains the same amino acid composition and approximately the same dipeptide frequencies as the original sequence.
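A shuffle that preserves 2-let counts can be implemented with the Euler-path method of Kandel et al. (1996): residues are vertices, each adjacent residue pair is a directed edge, and a random Eulerian path with the same start and end residues is drawn. The sketch below assumes that "excluding the first residue" means residue 1 stays in place while residues 2 to L are shuffled; the function names `doublet_shuffle` and `shuffle_negative` are hypothetical and not taken from the DeepTrio code.

```python
import random
from collections import Counter, defaultdict


def doublet_shuffle(seq: str, rng=random) -> str:
    """Random permutation of `seq` with exactly the same 2-let (doublet) counts."""
    if len(seq) < 3:
        return seq
    edges = defaultdict(list)  # residue -> list of successor residues (multigraph)
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    last = seq[-1]

    # Wilson's algorithm: random tree of "last exit" edges pointing toward `last`.
    in_tree = {last}
    last_exit = {}
    for start in list(edges):
        u = start
        while u not in in_tree:  # loop-erased random walk until it hits the tree
            last_exit[u] = rng.choice(edges[u])
            u = last_exit[u]
        u = start
        while u not in in_tree:  # attach the erased walk to the tree
            in_tree.add(u)
            u = last_exit[u]

    # Shuffle each residue's exits, forcing its tree edge to be used last.
    for v, outs in edges.items():
        if v != last:
            outs.remove(last_exit[v])
        rng.shuffle(outs)
        if v != last:
            outs.append(last_exit[v])

    # Follow the exits in order; this traces an Eulerian path from seq[0] to `last`.
    used = Counter()
    cur, out = seq[0], [seq[0]]
    for _ in range(len(seq) - 1):
        nxt = edges[cur][used[cur]]
        used[cur] += 1
        cur = nxt
        out.append(cur)
    return "".join(out)


def shuffle_negative(seq: str, rng=random) -> str:
    """Keep residue 1 fixed and doublet-shuffle the rest (assumed reading)."""
    return seq[:1] + doublet_shuffle(seq[1:], rng)
```

Because the shuffled part starts with the original residue 2, the pair formed by residues 1 and 2 is also kept, so every dipeptide count of the full sequence is preserved under this reading.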

Saccharomyces cerevisiae core data
The S. cerevisiae core dataset, a widely used benchmark, is composed of 11 188 PPI cases, including 5594 positive cases compiled by Guo et al. (2008) and a set of 5594 negative cases whose composition varies across papers. The positive cases are selected from the DIP database (Salwinski et al., 2004; Xenarios et al., 2002), from which proteins shorter than 50 amino acids or sharing ≥40% sequence identity are removed. The negative cases are generated by randomly pairing proteins without obvious evidence of interaction. However, the S. cerevisiae positive sets used by DeepFE-PPI and PIPR differ, so we use both S. cerevisiae datasets to train and test DeepTrio and the baseline approaches.

Single-protein data
The single-protein case consists of two components: a normal protein sequence and a sequence in which all residues are replaced by mask bits (Fig. 1b). Each unique sequence in the positive and negative datasets yields one case in the single-protein set. This set is designed to relieve the obscuring influence of the relative property and to prevent potential weight polarization in the intermediate layers. Single-protein cases are trained in the same way as positive and negative cases. Note that this set is used only for training DeepTrio and does not take part in its evaluation.
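Building the single-protein set from the PPI pairs can be sketched as below. The class indices and the helper name `single_protein_cases` are hypothetical, and the fully masked partner is assumed to be represented by an empty string that later padding turns into a sequence made entirely of mask bits.

```python
from typing import Iterable, List, Tuple

# Hypothetical class indices: interaction / noninteraction / single-protein.
INTERACTION, NONINTERACTION, SINGLE_PROTEIN = 0, 1, 2


def single_protein_cases(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, int]]:
    """Build one single-protein case per unique sequence in the PPI pairs.

    The partner is an empty string; padding it to the fixed input length
    turns it into a sequence made entirely of mask bits.
    """
    unique = dict.fromkeys(s for pair in pairs for s in pair)  # keeps first-seen order
    return [(seq, "", SINGLE_PROTEIN) for seq in unique]


pairs = [("MKVLAT", "MLLGSA"), ("MKVLAT", "MAAPRQ")]
cases = single_protein_cases(pairs)
```

Because the two subnetworks share weights, which input slot holds the masked partner should not matter when the two representations are later merged by a symmetric operation.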

Protein feature encoder
DeepTrio employs a Siamese architecture with the multiple parallel convolution (multiscale convolution) module to capture various protein features in multiscale windows. It takes as input a protein pair $(X, X')$ and yields two protein representations $(H_{\text{conc}}, H'_{\text{conc}})$ for downstream analysis (Fig. 2).

Input and embedding module
The input protein sequence is projected into a sparse orthonormal vector space by one-hot encoding in the input module. Each of the two input proteins, $S_A$ and $S_B$, is transformed into a binary matrix $X \in \mathbb{R}^{L \times 23}$ as follows:
$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_L \end{bmatrix},$$
where $x_i \in \mathbb{R}^{1 \times 23}$ ($i = 1, 2, \ldots, L$) is a binary vector of length 23 (22 for the proteinogenic amino acids and 1 for the mask bit) corresponding to the $i$th amino acid residue in the sequence, and $L$ is fixed to 1500. A trainable embedding weight matrix $W_e \in \mathbb{R}^{23 \times d}$ (optimized by backpropagation) maps $X$ to a dense continuous vector space:
$$E = X W_e,$$
where $E \in \mathbb{R}^{L \times d}$ is the embedded representation of one input protein and $d$ is the feature dimension of the amino acid symbol lexicon.
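A minimal plain-Python sketch of the input module follows. It assumes a hypothetical column order (the 20 standard amino acids, then selenocysteine U and pyrrolysine O) with column 22 reserved for the mask bit; the order actually used by DeepTrio is not stated here. The toy embedding values are arbitrary.

```python
from typing import List

ALPHABET = "ACDEFGHIKLMNPQRSTVWYUO"  # hypothetical order: 20 standard + U + O
MASK = len(ALPHABET)                 # column 22 is the mask bit
VOCAB = MASK + 1                     # 23 one-hot columns


def one_hot(seq: str, length: int = 1500) -> List[List[int]]:
    """Binary length x 23 matrix X; positions after the sequence end are mask bits."""
    if len(seq) > length:
        raise ValueError("sequence longer than the fixed input length")
    # ALPHABET.index raises ValueError for residues outside the alphabet (e.g. 'X').
    idx = [ALPHABET.index(a) for a in seq] + [MASK] * (length - len(seq))
    return [[1 if col == k else 0 for col in range(VOCAB)] for k in idx]


def embed(x: List[List[int]], w_e: List[List[float]]) -> List[List[float]]:
    """E = X W_e, i.e. (length x 23) @ (23 x d) -> (length x d)."""
    d = len(w_e[0])
    return [[sum(row[k] * w_e[k][c] for k in range(VOCAB)) for c in range(d)] for row in x]


# Toy 23 x 2 embedding matrix (arbitrary values, for illustration only).
w_e = [[float(k), -float(k)] for k in range(VOCAB)]
e = embed(one_hot("ACD", length=5), w_e)
# Each row of E equals the row of W_e selected by that residue's one-hot column.
```

Since every row of X has exactly one nonzero entry, the product reduces to a row lookup, which is why frameworks usually implement this step as an index-based embedding layer (for example `tf.keras.layers.Embedding`) instead of materializing X.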

Masking module
A Boolean matrix $B \in \mathbb{R}^{L \times 1}$ is attached to the embedded representation $E$ in this module to exclude masked residues from the calculation in the downstream modules. This operation is used in three scenarios (Fig. 1b):
i. The length of protein sequences is fixed to 1500, so shorter sequences are padded with mask bits.
ii. In the single-protein case, the whole sequence of one of the proteins is replaced by mask bits, so only one protein takes part in the computation of the deep-learning model.
iii. When DeepTrio investigates the effect of a particular residue $b_i$ on PPI, a mask bit is attached to this residue, which blocks the calculation of $b_i$ in the downstream layers.
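The Boolean vector for the three scenarios can be sketched as follows, assuming `True` marks a position that stays in the computation. The helper name `boolean_mask` is hypothetical.

```python
from typing import Iterable, List


def boolean_mask(seq: str, length: int = 1500, masked: Iterable[int] = ()) -> List[bool]:
    """B[i] is True if position i takes part in downstream computation."""
    b = [i < len(seq) for i in range(length)]  # (i) padding positions are masked
    for p in masked:                            # (iii) probe selected residues
        b[p] = False
    return b


padded = boolean_mask("MKV", length=5)             # scenario (i)
single = boolean_mask("", length=5)                # scenario (ii): whole partner masked
probe = boolean_mask("MKV", length=5, masked=[1])  # scenario (iii): mask residue 2
```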

Multiple parallel convolutional module with pooling
The embedded representation $E$ is analyzed by $N$ parallel convolution filters, the $n$th of which has $M_n$ ($n = 1, 2, \ldots, N$) kernels (Fig. 2). Each convolution filter extracts a specific aspect of the protein profile and outputs
$$T^{(n)}_{k,m} = \sum_{i=1}^{l_n} \sum_{j=1}^{d} v^{(m,n)}_{i,j}\, E_{(k-1)s_n + i,\, j},$$
where $l_n$ and $s_n$ denote the length of the convolution window and the stride of the $n$th convolution filter, respectively. The output $T^{(n)}_{k,m}$ ($k = 1, 2, \ldots, \lfloor (L - l_n)/s_n \rfloor + 1$) is the $m$th element in the $k$th row of the output of the $n$th convolution filter, $v^{(m,n)}_{i,j}$ is the $j$th element in the $i$th row of the $m$th kernel in the $n$th convolution filter, and $E_{(k-1)s_n + i,\, j}$ is the $j$th element in the $((k-1)s_n + i)$th row of the embedded matrix $E$. No bias term is added in the convolution.
The filter outputs are activated by the rectified linear unit (ReLU) (Xu et al., 2015), yielding a set of feature maps $\{A^{(1)}, A^{(2)}, \ldots, A^{(N)}\}$ computed as
$$a^{(n)}_{k,m} = \max\left(0, T^{(n)}_{k,m}\right),$$
where $a^{(n)}_{k,m}$ is the $m$th element in the $k$th row of $A^{(n)}$. After obtaining these feature maps, a global max-pooling operation is performed to reduce the dimension of the feature maps and highlight the most significant features. The max-pooling output $H^{(n)} \in \mathbb{R}^{1 \times M_n}$ (for the $n$th convolution filter) is given by
$$h^{(n)}_m = \max_{k} a^{(n)}_{k,m},$$
where $h^{(n)}_m$ is the $m$th element of $H^{(n)}$. Next, we flatten and concatenate all $H^{(n)}$ ($n = 1, 2, \ldots, N$) to obtain a new row vector $H_{\text{conc}} \in \mathbb{R}^{1 \times N}$:
$$H_{\text{conc}} = \left[H^{(1)}, H^{(2)}, \ldots, H^{(N)}\right].$$
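The following plain-Python sketch walks through one forward pass of the encoder for a single protein: a bias-free strided convolution per filter, ReLU, global max pooling and concatenation. Zeroing the rows of masked residues is an assumption about how masked positions are kept out of the sums, the toy filters are arbitrary, and indices in the code are 0-based.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
# One filter = (kernels, window length l_n, stride s_n); each kernel is l_n x d.
Filter = Tuple[List[Matrix], int, int]


def conv1d(e: Matrix, kernels: List[Matrix], window: int, stride: int) -> Matrix:
    """Bias-free convolution: T[k][m] = sum_{i,j} kernel_m[i][j] * E[k*stride + i][j]."""
    d = len(e[0])
    n_out = (len(e) - window) // stride + 1
    return [[sum(ker[i][j] * e[k * stride + i][j] for i in range(window) for j in range(d))
             for ker in kernels]
            for k in range(n_out)]


def relu(t: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in t]


def global_max_pool(a: Matrix) -> List[float]:
    """h_m = max over positions k of A[k][m]."""
    return [max(col) for col in zip(*a)]


def encode(e: Matrix, b: Sequence[bool], filters: Sequence[Filter]) -> List[float]:
    """Return H_conc for one protein (assumes masked rows are zeroed first)."""
    e = [row if keep else [0.0] * len(row) for row, keep in zip(e, b)]
    h_conc: List[float] = []
    for kernels, window, stride in filters:
        h_conc.extend(global_max_pool(relu(conv1d(e, kernels, window, stride))))
    return h_conc


# Toy example with d = 1: a window-2 filter with one kernel and a
# window-1, stride-2 filter with two kernels.
toy_e = [[1.0], [2.0], [3.0], [4.0]]
toy_filters = [([[[1.0], [1.0]]], 2, 1), ([[[1.0]], [[-1.0]]], 1, 2)]
h = encode(toy_e, [True] * 4, toy_filters)
```

Under this assumption, a fully masked protein produces an all-zero representation, which matters for how the two subnetwork outputs are later merged.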

Prediction and learning objectives
Two max-pooling outputs generated by the aforementioned modules are first merged into one vector, and then passed into the dense layers to calculate the probability value for PPI. The learning architecture is trained to optimize the cross-entropy loss between predictions and targets by backpropagation with AMSGrad algorithm (Reddi et al., 2019).

Prediction module
The two max-pooling outputs $H^{A}_{\text{conc}}$ and $H^{B}_{\text{conc}}$, produced by the two subnetworks (which share the same configuration and weights), are combined by element-wise addition into a merged vector $H_{\text{merged}} \in \mathbb{R}^{1 \times N}$. Compared with element-wise multiplication, addition prevents $H_{\text{merged}}$ from becoming a zero vector when a single-protein case is input. The merged vector $H_{\text{merged}}$ is first passed through two dense layers and then normalized by the softmax function:
$$c = \mathrm{softmax}\left(H_{\text{merged}} W_{f1} W_{f2}\right),$$
where $W_{f1} \in \mathbb{R}^{N \times f_1}$ and $W_{f2} \in \mathbb{R}^{f_1 \times 3}$ are the weight matrices of the first and second dense layers, respectively. The $i$th dimension of $c \in \mathbb{R}^{1 \times 3}$ is the confidence score $c_i \in [0, 1]$ of the $i$th class.
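The merge-and-classify step can be sketched as below. This is an illustration under stated assumptions: the dense layers have no bias, no nonlinearity is applied between them (the activation, if any, is not specified here), the masked partner's representation is all zeros, and the weight values are arbitrary.

```python
import math
from typing import List

Matrix = List[List[float]]


def dense(h: List[float], w: Matrix) -> List[float]:
    """Row vector times weight matrix (no bias, no activation)."""
    return [sum(h[i] * w[i][c] for i in range(len(h))) for c in range(len(w[0]))]


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    ex = [math.exp(v - m) for v in z]
    s = sum(ex)
    return [v / s for v in ex]


def predict(h_a: List[float], h_b: List[float], w_f1: Matrix, w_f2: Matrix) -> List[float]:
    merged = [a + b for a, b in zip(h_a, h_b)]  # element-wise addition
    return softmax(dense(dense(merged, w_f1), w_f2))


# Single-protein case: the masked partner's representation is all zeros.
h_a, h_b = [1.0, 2.0], [0.0, 0.0]
product = [a * b for a, b in zip(h_a, h_b)]  # multiplication would lose h_a entirely
w_f1 = [[1.0, 0.0], [0.0, 1.0]]
w_f2 = [[0.5, -0.5, 0.0], [0.0, 0.5, 1.0]]
c = predict(h_a, h_b, w_f1, w_f2)  # three class probabilities summing to 1
```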

Learning objective
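For a given protein pair $p$, its class label $y_p$ is defined as
$$y_p = \begin{cases} (1, 0, 0) & \text{if } p \text{ is an interacting pair},\\ (0, 1, 0) & \text{if } p \text{ is a noninteracting pair},\\ (0, 0, 1) & \text{if } p \text{ is a single-protein case}. \end{cases}$$
The learning model is trained to minimize the following cross-entropy loss so that the inputs are classified into their corresponding classes:
$$\mathcal{L} = \frac{1}{Z} \sum_{p} \mathrm{CEE}(c_p, y_p) = -\frac{1}{Z} \sum_{p} \sum_{i=1}^{3} y^{p}_{i} \log c^{p}_{i},$$
where CEE is the cross-entropy error function, $c^{p}_{i}$ and $y^{p}_{i}$ are the $i$th scalar components of the model prediction $c_p$ and its corresponding class label $y_p$, respectively, and $Z$ is the number of inputs in a batch.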
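The batch loss can be computed as in the sketch below, which assumes the class order interaction, noninteraction, single-protein and averages over the Z inputs of a batch. The small `eps` clamp is an added safeguard against log(0), not part of the formula.

```python
import math
from typing import List

# One-hot targets; the class order is an assumption for illustration.
TARGETS = {
    "interaction": [1.0, 0.0, 0.0],
    "noninteraction": [0.0, 1.0, 0.0],
    "single-protein": [0.0, 0.0, 1.0],
}


def batch_cross_entropy(preds: List[List[float]], labels: List[List[float]],
                        eps: float = 1e-12) -> float:
    """Mean over the Z inputs of -sum_i y_i * log(c_i)."""
    z = len(preds)
    total = 0.0
    for c, y in zip(preds, labels):
        total -= sum(yi * math.log(max(ci, eps)) for ci, yi in zip(c, y))
    return total / z


loss = batch_cross_entropy(
    [[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]],
    [TARGETS["interaction"], TARGETS["single-protein"]],
)
```

Both toy predictions assign probability 0.5 to the correct class, so the mean loss equals ln 2.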

Optimization strategy
We adopt AMSGrad (Reddi et al., 2019), a variant of the Adam optimizer (Kingma and Ba, 2014), to optimize the cross-entropy loss of our learning model. Following the same strategy as PIPR, the learning rate α is set to 0.001, and the exponential decay rates β1 and β2 are set to 0.9 and 0.999, respectively.

Hyperparameter tuning
The hyperparameter search space of our model consists of 13 dimensions (including the embedding dimension, dropout rates, convolution kernel lengths, convolution strides and optimizers), which form about 140 000 combinations (Supplementary Table S1). This space is too large for grid search to find the optimal combination. Therefore, we use the Bayesian optimization tool GPyOpt (The GPyOpt Authors, 2016) to guide the search, which has been shown to be more efficient than randomized grid search (Wang et al., 2019). For GPyOpt, we set the number of initial random search points and the maximum number of iterations to 10 and 50, respectively. The performance of all candidate models and their corresponding hyperparameter settings are listed in Supplementary Table S2.

Implementation details
We randomly initialize the weights of the embedding, convolution and dense layers from the Glorot uniform distribution (Glorot and Bengio, 2010), a common initialization strategy in deep-learning methods (Kulmanov et al., 2018; Seo et al., 2018; Wang et al., 2020). We build DeepTrio on the open-source TensorFlow 2.0 library (Abadi et al., 2016), and train and evaluate all baseline models on an NVIDIA Tesla P100 GPU with 16 GB of memory.
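The Glorot uniform scheme draws each weight from U(−limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). A minimal sketch for a dense or embedding weight matrix follows; the 23 × 64 shape is an arbitrary example, not DeepTrio's tuned embedding dimension.

```python
import math
import random
from typing import List


def glorot_uniform(fan_in: int, fan_out: int, rng=random) -> List[List[float]]:
    """Draw a fan_in x fan_out matrix from U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)] for _ in range(fan_in)]


# Example: a 23 x 64 embedding matrix (d = 64 chosen only for illustration).
w = glorot_uniform(23, 64, random.Random(0))
```

For a 1D convolution kernel with window length l, C_in input channels and C_out kernels, TensorFlow's `GlorotUniform` initializer uses fan_in = l · C_in and fan_out = l · C_out.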

Calculating the effect of protein residues on prediction
Suppose we have a pair of interacting proteins

Results
We report the performance of DeepTrio and other approaches on four different PPI datasets. Further, we test the performance of DeepTrio on the multispecies dataset, where proteins are filtered based on different thresholds of sequence identity. In addition to the binary prediction of PPIs, DeepTrio can generate an intuitive protein portrait for detecting potentially important residues for interaction. Lastly, a concise online application has been developed to help researchers make better use of DeepTrio.

Performance comparison of DeepTrio with other approaches
The main task of DeepTrio is to estimate the interaction probability of a given protein pair from its sequences. We compare DeepTrio with several state-of-the-art PPI prediction methods, including SVM-AC (Guo et al., 2008), SVM-MCD (You et al., 2014), DPPI (Hashemifar et al., 2018), PIPR (Chen et al., 2019) and DeepFE-PPI (Yao et al., 2019), on a variety of benchmark datasets. Furthermore, we report the performance of a simplified variant of DeepTrio, named DeepDuo, which has the same learning architecture as DeepTrio but is trained without the single-protein dataset. Using this variant as a control, we can investigate how the single-protein cases influence the prediction performance of our model.

BioGRID multivalidated physical interaction data
We perform 5-fold cross-validation on the BioGRID human and yeast datasets: the data are divided into five equal parts, and each part is used once as the test set while the remaining four parts are used for training. We use eight quality metrics, including accuracy, precision, sensitivity, specificity, F1 score, Matthews correlation coefficient (MCC) and average precision (AP), to assess the prediction performance of the models. Higher values in all these metrics indicate better performance.
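The fold assignment and the threshold-based metrics can be sketched as follows. This assumes binary predictions have already been obtained from the model's outputs (how DeepTrio thresholds its three-class output is not specified here), the confusion-matrix counts are made up, and AP is omitted because it requires ranked scores rather than counts.

```python
import math
import random
from typing import Dict, Iterator, List, Tuple


def k_fold(n: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each sample is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Threshold-based metrics computed from a confusion matrix."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


m = binary_metrics(tp=40, fp=10, tn=45, fn=5)  # hypothetical counts
```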
As shown in Tables 1 and 2, PIPR's RCNN architecture performs remarkably well and attains the highest sensitivity on both the human and yeast datasets. However, DeepTrio achieves the best performance in the other metrics, which we attribute to its multiscale convolution architecture learning deeper features from protein sequences. For example, DeepTrio outperforms PIPR by 0.52% and 1.79% in accuracy, and by 1.43% and 4.34% in precision, on the human and yeast datasets, respectively.
In addition, we compare DeepDuo and DeepTrio on the BioGRID benchmarks to gain insight into the role of single-protein training in PPI prediction. DeepTrio performs consistently better than DeepDuo in all evaluation metrics (Tables 1 and 2). For example, on the yeast dataset DeepTrio attains an accuracy of 97.55% (0.49% higher than DeepDuo) and an MCC of 95.15% (1.01% higher than DeepDuo). These results suggest that single-protein training improves the performance of our model on the BioGRID datasets.

Saccharomyces cerevisiae core data
We first use DeepFE-PPI's S. cerevisiae dataset to evaluate the performance of DeepTrio. The positive set from DeepFE-PPI is identical to that of You et al. (2015). To make the data suitable for the model input, we remove 255 cases that contain proteins longer than 1500 amino acids, and use the truncated data to retrain and evaluate DeepTrio and PIPR. While DeepFE-PPI attains the highest scores on its own data, DeepTrio outperforms PIPR in five evaluation metrics (Table 3). Second, we test DeepTrio and DeepFE-PPI on PIPR's dataset, from which we remove 231 cases containing proteins longer than 2000 amino acids. The results in Table 4 show that DeepTrio outperforms DeepFE-PPI on PIPR's dataset (e.g. 3.74% higher accuracy, 8.04% higher precision and 7.44% higher MCC). PIPR achieves the best performance on its own dataset, but performs worse than DeepTrio in precision and specificity. In addition, DeepTrio outperforms DeepDuo on both S. cerevisiae datasets in most metrics (Tables 3 and 4). Detailed performance of DeepTrio, PIPR and DeepFE-PPI on the two S. cerevisiae datasets is provided in the Supplementary Material.

Comprehensive comparison between DeepTrio and PIPR
Based on the four datasets mentioned above, we count how many times DeepTrio or PIPR attains higher scores with respect to six metrics. Table 5 shows that DeepTrio offers robust performance over the four datasets and outperforms PIPR in many evaluation metrics, especially in precision and specificity.

PPI prediction on multispecies dataset
Following the same strategy as PIPR (Chen et al., 2019), we perform 5-fold cross-validation of DeepTrio on the multispecies dataset (Caenorhabditis elegans, Escherichia coli and Drosophila melanogaster), where proteins are filtered based on different thresholds of sequence identity (40%, 25%, 10% and 1%). To make the data suitable for the model input, we also remove the cases containing proteins longer than 1500 amino acids. The results in Table 6 show that DeepTrio performs consistently well on a series of datasets with different sequence identities.

PPI prediction on independent test set
Here, we use the virus-human interaction dataset from Liu-Wei et al. (2021) as an independent test set to assess DeepTrio and other approaches trained on the BioGRID human-human interaction dataset. Following the preprocessing methods of previous studies (Hashemifar et al., 2018; Khurana et al., 2018; Rawi et al., 2018), we first reduce sequence redundancy in the virus protein data to a maximum sequence identity of 10%. Second, we exclude all virus sequences in the independent test set that share ≥25% sequence identity with any sequence in the human-human interaction training set. The negative independent test data are generated by randomly shuffling the protein sequences in the virus-human interaction dataset (this method is elaborated in Section 2.1.1). The final independent test set is composed of 8929 interacting and 8929 noninteracting protein pairs. The results in Figure 3 show that DeepTrio performs competitively with PIPR on the independent test set.

Detecting and visualizing potentially important residues for interaction
Since experimental methods require meticulous operations and considerable time to identify the sites important for interaction, it is valuable to assess experimental protocols in advance and to reject the initial targets with the lowest interaction probability. We therefore extend DeepTrio to an additional scenario that helps detect potentially important sites for interaction (which are not limited to residues in core binding regions, but also include other crucial residues that shape the external and internal structures, provide skeletal support through long aliphatic side chains or create a hydrophobic environment). The main goal of this extension is to find out which residues are mainly responsible for the prediction results and to visualize the importance score of each residue in a sequence.
Several previous works have applied visualization techniques to provide interpretable explanations for deep-learning models. DeepBind (Alipanahi et al., 2015) uses 'mutation maps' to illustrate the effect that each possible point mutation may have on the binding affinity between DNA and proteins. DeepChrome (Singh et al., 2016) uses a network-centric approach (Yosinski et al., 2015) to extract the class-specific feature patterns that strongly influence gene expression predictions. DeepSig (Savojardo et al., 2018) employs deep Taylor decomposition (Montavon et al., 2017) to compute a relevance score measuring the contribution of each input neuron to the prediction. In this work, the combination of the single-protein training strategy and the masking operation allows DeepTrio to visualize the contribution of each input neuron to the prediction (as elaborated in Section 2.4).
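Assuming an occlusion-style definition, in which a residue's importance is the drop in the predicted interaction probability when only that residue is replaced by a mask bit, the scoring loop can be sketched as follows. `predict` is a hypothetical stand-in for a trained model that takes a Boolean mask over the probed protein's residues; a positive score then corresponds to a positive effect on the prediction and a negative score to a negative effect.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[bool]], float]


def importance_scores(predict: Predictor, length: int) -> List[float]:
    """score_i = P(interaction | all residues) - P(interaction | residue i masked)."""
    full = predict([True] * length)
    scores = []
    for i in range(length):
        mask = [True] * length
        mask[i] = False  # attach a mask bit to residue i only
        scores.append(full - predict(mask))
    return scores


# Hypothetical toy model: residue 3 helps the interaction, residue 1 hinders it.
def toy_predict(mask: Sequence[bool]) -> float:
    return 0.2 + 0.5 * mask[2] - 0.1 * mask[0]


scores = importance_scores(toy_predict, length=4)
```

This definition needs one extra forward pass per residue, so the cost grows linearly with protein length.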
We validate the visualization results given by DeepTrio (trained on the BioGRID human multivalidated physical interaction data) against recent experimental evidence from biochemical studies. Note that none of the PPIs mentioned below, nor their mutants, are included in the training data of DeepTrio. Figure 4a shows the importance map of mutant human calreticulin (CALR), which loses most of the C-terminal acidic residues and gains a novel common C-terminus of 36 amino acids rich in positively charged residues as a result of a heterogeneous set of +1 bp frameshift mutations in exon 9 (Nangalia et al., 2013). These positively charged residues in the novel C-terminus are reported to be essential for mediating the erroneous activation of MPL signaling and the physical interaction between mutant CALR and the thrombopoietin receptor MPL, which can lead to myeloproliferative disorders (Elf et al., 2016, 2018). We use the 'importance map' to illustrate the importance score of each residue in the mutant CALR (p.L367fs*46) (Fig. 4a).
Fig. 4: An 'importance map' is employed to visualize the effect of each amino acid residue on interaction, where residues in red exert positive effects and those in blue exert negative effects on prediction. (a) Analysis of the potential importance of each residue in CALR p.L367fs*46 for interaction with MPL. The positively charged residues in the last 36 amino acids show a strong trend toward higher importance scores, and these residues have been shown to be essential for the physical interaction between CALR p.L367fs*46 and MPL. (b) Analysis of the potential importance of each residue in ChoKα for interaction with the SH3 domain of c-Src. The poly-proline region in ChoKα residues 53-78 has relatively high scores in the importance map, and this region is reported to be crucial for the interaction with the SH3 domain of c-Src (Kall et al., 2019).
The 'importance map' is rendered as a heat map with l squares (where l is the length of the given protein), and each row of the heat map holds 20 squares. In Figure 4a, most of the residues with crimson backgrounds are enriched in the C-terminus, where the positively charged residues (such as arginine and lysine) show a strong trend toward higher importance scores. These results are broadly consistent with previous experimental findings (Elf et al., 2016, 2018). Figure 4b depicts the importance map of choline kinase alpha (ChoKα). ChoKα catalyzes the phosphorylation of choline to phosphocholine, and its high expression has been associated with cancer malignancy and poor patient prognosis (Ramírez De Molina et al., 2002, 2005). Recent biophysical and biochemical studies (Kall et al., 2019) have demonstrated that the ChoKα poly-proline region in residues 49-79 (especially prolines 61 and 62) mediates the physical interaction between ChoKα and the SH3 domain of c-Src tyrosine kinases. In the ChoKα importance map (Fig. 4b), the highly scored residues are enriched in the N-terminal poly-proline region, which is consistent with these experimental findings.
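The heat-map layout of l squares in rows of 20 can be sketched as below, with a text stand-in for the color scale ('+' for positive or red, '-' for negative or blue, '.' for zero). The helper names and the 45 demo scores are hypothetical.

```python
from typing import List


def heatmap_rows(scores: List[float], width: int = 20) -> List[List[float]]:
    """Split per-residue scores into rows of `width` squares (last row may be shorter)."""
    return [scores[i:i + width] for i in range(0, len(scores), width)]


def _symbol(score: float) -> str:
    if score > 0:
        return "+"
    if score < 0:
        return "-"
    return "."


def render(scores: List[float], width: int = 20) -> str:
    """Text stand-in for the color map, one output line per heat-map row."""
    return "\n".join("".join(_symbol(s) for s in row) for row in heatmap_rows(scores, width))


demo = [0.3, -0.1, 0.0] * 15  # 45 hypothetical scores
text = render(demo)
```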
In practice, the importance map tends to highlight clusters of adjacent key residues that share similar properties, and its sensitivity decreases for large proteins. Another noteworthy observation in both Figures 4a and 4b is that the vast majority of the negative-effect residues have pale-blue backgrounds, which may be explained by the hypothesis that most point mutations reduce the interaction between two proteins that have already reached the optimal conformation for binding.

Online server
To provide an accessible interface, we have developed an online application based on the DeepTrio model. PPI prediction results and importance maps can be obtained by submitting two protein sequences to the web server. Results from multiple submissions are recorded on the web page and can be filtered and downloaded from the website. This online application is available at http://bis.zju.edu.cn/deeptrio.

Conclusion
With the development of deep-learning algorithms such as CNNs (LeCun and Bengio, 1995), recurrent neural networks (Hochreiter and Schmidhuber, 1997) and graph neural networks (Scarselli et al., 2009), an increasing number of sequence-based deep-learning methods have been developed for PPI prediction. A state-of-the-art approach, PIPR, adopts an RCNN architecture to capture local features and contextualized information and has achieved remarkable performance, but it provides neither a convenient implementation for inexperienced users nor a visualization method that makes the model interpretable. In contrast, DeepTrio provides superior PPI prediction and an intuitive visualization of the importance of each protein residue, in both online and offline implementations. Moreover, a variety of experimental evaluations show that the additional single-protein training improves the performance of PPI prediction by preventing weight polarization. For future work, a possible direction is to incorporate molecular docking calculations into DeepTrio for more accurate prediction of key regions for PPI. We also plan to explore dynamic visualization techniques to better interpret our model.
In summary, we propose a deep-learning-based model, DeepTrio, to predict PPIs using raw protein sequences. By adopting the multiple parallel convolution filter architecture that allows DeepTrio to capture the deep features from the protein profiles, our model achieves encouraging performance on the benchmark datasets in terms of various evaluation metrics. We also integrate the single-protein training strategy and masking operation to prevent weight polarization in the intermediary layers and enable DeepTrio to visualize the contribution of each protein residue to the prediction results. Furthermore, we also provide an online application for PPI prediction and important residue detection.